Compiler with inter-modular procedure optimization

ABSTRACT

A compiler method is adapted to be executed by a computer with limited memory, yet enables cross-compilation-unit (cross-CU) optimization during the conversion of a source code listing to an object code listing. The compiler method includes the steps of: converting plural source code listings into plural compilation units (CUs), each CU being an intermediate code representation; analyzing each CU and deriving a global CU table which includes a reference to each analyzed CU; a program symbol table which indicates in which CU each program routine is defined and/or referred to; and a global call graph which notes each routine in each CU, indicates references therebetween, and further indicates where the routine exists in the program symbol table. The method further derives a CU symbol table which includes, for each routine defined in a CU, a reference to the intermediate representation for that routine. The method compiles the CUs by analyzing each CU and employing at least the global call graph and program symbol table to enable cross-CU relationships to be taken into account and utilized in arranging an improved object code representation of the source code listing. CUs which are being operated upon are stored in uncompressed form, whereas other CUs may be stored in uncompressed form, in compressed form, or off-line on a disk memory.

FIELD OF THE INVENTION

This invention relates to a source code compiler which operates upon modules of source code listings and converts them into executable object code and, more particularly, to a source code compiler which enables inter-modular optimization of procedures present therein.

BACKGROUND OF THE INVENTION

At present, there are two common steps involved in constructing an application which will run on a computer. The first step is the compilation phase, which translates the source code into a set of object files written in machine language. The second step is the link phase, which combines the set of object files into an executable object code file. Almost all code generation and optimization decisions are made during the compilation phase; the link phase primarily relocates code and data, resolves branch addresses and provides binding to run-time libraries.

Today, most modern programming languages support the concept of separate compilation, wherein a single computer source code listing is broken up into separate modules that can be fed individually to the language translator that generates the machine code. This separation allows better management of the program's source code and allows faster compilation of the program. The separate code modules will hereafter be referred to synonymously as either "modules" or "compilation units" (CUs).

The use of CUs during the compilation process enables substantial savings in the memory required by the computer on which the compiler executes. However, such use limits the level of application performance achieved by the compiler. For instance, optimization actions taken by a compiler are generally restricted to procedures contained within a single CU, with the CU boundary preventing the compiler from accessing procedures in other CUs. This limitation is significant for in-lining and cloning, because it restricts the set of call sites at which these optimizations can be performed.

In-lining replaces a call site with the called routine's code. In-line substitution serves at least two purposes: it eliminates call overhead, and it tailors the called code to the particular set of arguments passed at a given call site. Cloning replaces a call site with a call to a specialized version of the original called procedure, which allows constant arguments to be propagated into the cloned routine. More specifically, cloning a procedure results in a version of the called procedure that has been tailored to one or more specific call sites, where certain variables are known to be constant on entry.
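The following Python fragment is only a source-level illustration; a compiler applies these transformations to its intermediate representation, not to source text. The routine `scale` and its callers are hypothetical. The fragment shows one call site left as a call, the same site with `scale` in-lined, and the site redirected to a clone specialized for the constant argument `factor == 2`.

```python
from typing import List


def scale(values: List[int], factor: int) -> List[int]:
    """Original called procedure (hypothetical)."""
    return [v * factor for v in values]


def caller_original(values: List[int]) -> List[int]:
    # Call site: the argument `factor` is the constant 2 here.
    return scale(values, 2)


def caller_inlined(values: List[int]) -> List[int]:
    # In-lining: the call is replaced by the body of scale(),
    # with the actual argument 2 substituted for `factor`.
    return [v * 2 for v in values]


def scale_by_2(values: List[int]) -> List[int]:
    """Clone of scale() specialized for factor == 2 (constant propagated)."""
    # With the constant known, the multiply can be strength-reduced to an add.
    return [v + v for v in values]


def caller_cloned(values: List[int]) -> List[int]:
    # Cloning: the call site now targets the specialized version.
    return scale_by_2(values)
```

All three callers compute the same result; the in-lined site has no call overhead, and the clone benefits from the propagated constant while remaining a separate, shareable routine.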

Importantly, modular handling of routines by the compiler creates a barrier across which information that could be of use to the compiler is invisible.

It has been recognized in the prior art that making cross-modular information available during compilation improves application performance. Thus, a compiler which can see across modular barriers can achieve the significant benefits of inter-procedural optimization and noticeable gains in the performance of the resulting application.

Loeliger et al., in a paper entitled "Developing an Inter-procedural Optimizing Compiler", ACM SIGPLAN Notices, Vol. 29, No. 4, April 1994, pp. 41-48, describe how a compiler developed for the C-series supercomputers (marketed by Convex Computer Corporation) performs inter-procedural optimization. Initially, a series of passes is made over a database that contains information about all of the procedures in the application. A number of analyses are performed to provide information where traditional compilers would make worst-case assumptions. For instance, the database is analyzed to determine which procedures are invoked by a call (call analysis); which names refer to the same location (alias analysis); which pointers point to which locations (pointer tracking); which procedures use which scalars (scalar analysis); which procedures should be in-lined at which call sites (inline analysis); and so on.

The results of these analyses, i.e., a "profile feedback", are then employed during the compile action to improve the application. Little description is provided by Loeliger et al. regarding how the actual "build" process utilizes the profile feedback information obtained during the database analysis. Further, the Loeliger et al. process is not compatible with the widely used "make" utility, available in many operating systems. For instance, in the UNIX operating system, the "make" utility uses a makefile to incorporate changes into a program. The makefile includes commands which perform as little work as possible, i.e., only the changed source files are converted to object code. The make utility then links the previously compiled object code with just the newly generated object code, avoiding the necessity of recompiling the entire code listing.
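The rebuild decision that makes make economical can be sketched as follows, under simplifying assumptions: modification times are plain numbers, each hypothetical source file `x.c` produces `x.o`, and each object depends only on its own source. A real makefile declares its own rules and dependencies.

```python
from typing import Dict, List, Optional


def out_of_date(target_mtime: Optional[float], prereq_mtimes: List[float]) -> bool:
    """make's basic rule: remake a target that is missing or older than a prerequisite."""
    if target_mtime is None:
        return True
    return any(p > target_mtime for p in prereq_mtimes)


def objects_to_rebuild(sources: Dict[str, float],
                       objects: Dict[str, Optional[float]]) -> List[str]:
    """Map each source 'x.c' to object 'x.o' and list the objects needing recompilation."""
    stale = []
    for src, src_mtime in sorted(sources.items()):
        obj = src.rsplit(".", 1)[0] + ".o"
        if out_of_date(objects.get(obj), [src_mtime]):
            stale.append(obj)
    return stale
```

For example, with `a.c`, `b.c` and `c.c` modified at times 100, 200 and 50, and `a.o` and `b.o` built at time 150, only `b.o` (older than its source) and `c.o` (missing) are recompiled; `a.o` is reused at link time.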

It is important that any new compiler be compatible with the make utility. Further, it is important that any new compilation procedure be able to run at a reasonable speed, given the limited memory available on personal computers and workstation-class processors.

Accordingly, there is a need for an improved compiler which enables cross-CU optimization and is compatible with the make utility. Further, there is a need for an improved compiler which enables cross-CU optimization while keeping compile time short and minimizing the amount of memory required to execute the optimization procedure.

SUMMARY OF THE INVENTION

A compiler method is adapted to be executed by a computer with limited memory, yet enables cross-compilation-unit (cross-CU) optimization during the conversion of a source code listing to an object code listing. The compiler method includes the steps of: converting plural source code listings into plural compilation units (CUs), each CU being an intermediate code representation; analyzing each CU and deriving a global CU table which includes a reference to each analyzed CU; a program symbol table which indicates in which CU each program routine is defined and/or referred to; and a global call graph which notes each routine in each CU, indicates references therebetween, and further indicates where the routine exists in the program symbol table. The method further derives a CU symbol table which includes, for each routine defined in a CU, a reference to the intermediate representation for that routine. The method compiles the CUs by analyzing each CU and employing at least the global call graph and program symbol table to enable cross-CU relationships to be taken into account and utilized in arranging an improved object code representation of the source code listing. CUs which are being operated upon are stored in uncompressed form, whereas other CUs may be stored in uncompressed form, in compressed form, or off-line on a disk memory.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of functions performed in a prior art compiler, wherein cross-CU optimization is not available.

FIG. 2 is a block diagram of compiler functions performed in the invention hereof which enable cross-CU optimization.

FIG. 2a is a block diagram of a system for executing the compiler of FIG. 2.

FIG. 3 is a schematic representation of four code listings which are serially input into the compiler of FIG. 2.

FIG. 4 is a schematic diagram illustrating various tables and listings which are provided during the action of the system of FIG. 2 and which enable cross-CU optimization.

FIG. 5 is a logical flow diagram which illustrates an example of the compilation action performed by the invention hereof.

DETAILED DESCRIPTION OF THE INVENTION

Referring to FIG. 1, a conventional compiler system is illustrated. A source listing (e.g., Fortran, C++, Ada, or some other source code) is fed to a front end converter 12 which converts the received source listing into an intermediate code representation (IR). As each source CU is fed to front end converter 12, the output is an IR code listing of the source code lines within the input CU. That CU is then fed to an optimizer 14 which performs optimizations on the received IR code to improve its execution efficiency.

As indicated above, optimizer 14 generally applies its optimization actions to each CU individually and does not perform cross-CU optimization. Thereafter, the optimized CU (still in IR form) is fed to a back end processor 16 which outputs an object code listing for the entire CU. The object code listing is then fed to a linker 18 which combines the various CUs into an executable form of object code for the computer on which the code is to execute.

FIG. 2 is a block diagram of a compiler which incorporates the invention. Source code CUs are fed to front end processor 20, which converts the input source code modules to corresponding IR object code CUs that maintain the source code module boundaries. Accordingly, the IR object code CUs remain compatible with the make utility described above.

Each CU is fed to a linker 22 which determines whether the input code is in IR form or in object code form. The IR form code is fed to optimizer 24, which establishes a number of tables that are utilized to achieve program-wide optimization, notwithstanding the segmentation of the program input into CU-size blocks. Thereafter, the optimized code listing for each CU is fed to back end processor 26 which, in turn, outputs object code CUs that are then linked by linker 27 into a single executable code file.

In FIG. 2a, a system for executing the compiler of FIG. 2 is illustrated. A computer 30 includes a central processing unit (CPU) 31 which is coupled via a bus system to a random access memory (RAM) 32, a disk drive 33 and a read-only memory (ROM) 34. A memory cartridge 35 is employed to insert a source listing into computer 30 and may also be used to insert a compiler routine which incorporates the invention hereof.

RAM 32, as an example, provides temporary storage for a plurality of code listings that are utilized during the operation of the invention. Source listing 36 comprises a set of files including a plurality of routines to be run in the course of execution of the program defined by source listing 36. A compiler 37 is employed to convert source listing 36 into machine-executable object code 38 (which is also stored in RAM 32). Compiler 37 includes a listing for optimizer procedure 24 and further includes a number of tables that will be discussed with respect to FIG. 4, i.e., global compilation unit table 40, program symbol table 42, global call graph 44 and a compilation unit symbol table 54 for at least one compilation unit.

The remaining description concerns the operation of optimizer 24, wherein program-wide optimization is achieved while conserving memory.

As will hereafter be understood, the method performed by the invention employs a hierarchical representation of the program which allows CUs of different sizes to be manipulated for the best use of memory resources. Each CU is held either in fully expanded form in virtual memory (e.g., in RAM), in compressed form in virtual memory, or in compressed form in a file on disk. The method reads CUs in from disk and uncompresses them only when they are needed.

Referring to FIG. 3, a plurality of CUs, i.e., CU1, CU2, CU3, etc., are shown. These CUs are serially fed to front end converter 20, are converted to IR object code form and are then passed to linker 22. CU1 and CU2 include many lines of source code; however, only several are shown in the figure, and these will be used for explanation purposes in the following description of the files which are created and their interaction with the procedures executed by optimizer 24.

For instance, CU1 includes routine A (), which includes a call to routine B (). CU1 is written in C++ form, with "A ()" and "B ()" indicating the names of the routines and the following braces {} enclosing the code which defines the respective routine. In CU2, routine B () is defined and its actual code listing is found between braces 28 and 29. In CU1, only the name B () is present.

Turning now to FIG. 4, the files and tables established by optimizer 24 will be described. As the CU IR object code is received, three program-wide listings are established as follows. A global CU table (GCUT) 40 includes a pointer to each received CU, along with an indication of the state of the respective CU. More specifically, each GCUT entry includes an indication as to whether the CU is in expanded form, has been compressed or is on disk. Each GCUT entry for a CU further includes a pointer to the address where the CU can be found in virtual memory (if present there).
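A minimal Python sketch of a GCUT entry and of how its state field could be used to decide what must happen before a CU is used in expanded form. The class and field names (`GlobalCUTable`, `CUState`, `disk_path`) are hypothetical, and an integer stands in for the virtual-memory pointer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class CUState(Enum):
    EXPANDED = "expanded"      # uncompressed in virtual memory
    COMPRESSED = "compressed"  # compressed, still in virtual memory
    ON_DISK = "on_disk"        # compressed and written to a disk file


@dataclass
class GCUTEntry:
    name: str
    state: CUState
    address: Optional[int] = None    # location in virtual memory, if resident
    disk_path: Optional[str] = None  # file holding the CU, if off-loaded


class GlobalCUTable:
    def __init__(self) -> None:
        self.entries: Dict[str, GCUTEntry] = {}

    def register(self, name: str, address: int) -> None:
        # A newly received CU starts out in expanded form.
        self.entries[name] = GCUTEntry(name, CUState.EXPANDED, address=address)

    def action_needed(self, name: str) -> str:
        """What must happen before the optimizer can use this CU in expanded form."""
        state = self.entries[name].state
        if state is CUState.EXPANDED:
            return "none"
        if state is CUState.COMPRESSED:
            return "decompress"
        return "read from disk, then decompress"
```

Because the GCUT is small and always resident, this check costs nothing compared with touching the CU itself.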

GCUT 40 and the additional program-wide tables/listings are maintained in persistent storage and remain there for the entire duration of the compilation action. As will hereafter be understood, individual CUs are positioned in a transitory store, where they may be in fully expanded form or in compressed form (both in virtual memory), or off-loaded onto a disk file.

A program symbol table (PST) 42 is built as the CU IR object code is received. PST 42 lists each routine in the received CU and indicates whether the routine is defined in the CU and/or whether the routine is merely referred to in the CU, its actual definition residing elsewhere. For instance, when CU1 is received and routine A () is encountered, it is determined that A () is defined in CU1, and this information is loaded into PST 42. The same action occurs when B () is encountered; however, from its placement in the code listing, it is determined that routine B () is merely referred to in CU1. (Note that the location of the definition of B () cannot be determined until CU2 is input.)

As each entry for a routine is made in PST 42, a pointer 43 is associated therewith which points to the CUs which either define the routine or refer to it, as the case may be.
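The PST can be pictured as a map from routine name to the sets of CUs that define or refer to the routine, the sets playing the role of pointer 43. This sketch uses hypothetical names and replays the FIG. 3 example, in which B () is referred to in CU1 before its definition arrives with CU2.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PSTEntry:
    defined_in: Set[str] = field(default_factory=set)   # CUs that define the routine
    referred_in: Set[str] = field(default_factory=set)  # CUs that refer to the routine


class ProgramSymbolTable:
    def __init__(self) -> None:
        # An entry is created the first time a routine name is seen.
        self.entries: Dict[str, PSTEntry] = defaultdict(PSTEntry)

    def add_definition(self, routine: str, cu: str) -> None:
        self.entries[routine].defined_in.add(cu)

    def add_reference(self, routine: str, cu: str) -> None:
        self.entries[routine].referred_in.add(cu)
```

After CU1 is processed, the entry for B () records only a reference from CU1; processing CU2 later fills in where B () is defined.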

As each routine is encountered in the incoming object code data stream, a global call graph (GCG) 44 is constructed. A node, e.g., 46, 48, . . ., is assigned as each routine is identified. Thus, when routine A () is encountered in the incoming code stream, node 46 is placed in GCG 44 with a pointer to the corresponding entry in PST 42. However, as no further information is yet available for routine A (), no links are established for it. When routine B () is encountered, a node 48 is established in GCG 44 and an edge 50 is created indicating that routine A () includes a call to routine B (). A pointer is also established from node 48 indicating where in PST 42 the entry for routine B () can be found.

When node 48 is initially established, the "referred-to" entry in PST 42 is determined from the code in CU1; however, information regarding where routine B () is defined is not yet present. That information arrives when CU2 is received with its definition of routine B ().
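A sketch of GCG construction, assuming (hypothetically) that a node holds only the key of its routine's PST entry, which plays the role of the pointer into PST 42. Because nodes are created on first encounter, the node for B () and the edge A -> B exist before CU2 supplies B's definition.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GCGNode:
    routine: str                                        # key of the routine's PST entry
    callees: Set[str] = field(default_factory=set)      # outgoing call edges


class GlobalCallGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, GCGNode] = {}

    def node(self, routine: str) -> GCGNode:
        # Create the node the first time the routine is encountered.
        if routine not in self.nodes:
            self.nodes[routine] = GCGNode(routine)
        return self.nodes[routine]

    def add_call(self, caller: str, callee: str) -> None:
        # Edge caller -> callee; the callee's node may exist before its definition is seen.
        self.node(caller).callees.add(self.node(callee).routine)
```

Nothing in a node points into a CU's IR, so the graph stays usable while CUs are compressed or on disk.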

Within the transitory store portion of virtual memory, received CUs are initially stored in expanded form until the amount of virtual memory allocated to expanded-form CU storage reaches a threshold level. Thereafter, received CUs are stored in virtual memory in compressed form and, if further virtual memory is not available for them, are off-loaded onto a disk file where they can be accessed by a call from optimizer 24.
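A possible shape for the transitory store, sketched in Python under stated assumptions: each CU's IR is an opaque byte string, zlib stands in for whatever compression the compiler uses, budgets are counted in bytes, and the oldest CUs are demoted first (the description notes that other replacement policies are possible). `TransitoryStore` and its parameters are hypothetical names.

```python
import os
import tempfile
import zlib
from collections import OrderedDict
from typing import Dict, Optional


class TransitoryStore:
    """Holds each CU's IR (as bytes) expanded, compressed, or on disk, within two budgets."""

    def __init__(self, expanded_limit: int, compressed_limit: int,
                 spill_dir: Optional[str] = None) -> None:
        self.expanded_limit = expanded_limit      # bytes allowed for expanded CUs
        self.compressed_limit = compressed_limit  # bytes allowed for compressed CUs
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="cu_spill_")
        self.expanded = OrderedDict()             # CU name -> IR bytes, oldest first
        self.compressed = OrderedDict()           # CU name -> zlib bytes, oldest first
        self.on_disk: Dict[str, str] = {}         # CU name -> spill file path

    @staticmethod
    def _size(tier: Dict[str, bytes]) -> int:
        return sum(len(b) for b in tier.values())

    def add(self, name: str, ir: bytes) -> None:
        self.expanded[name] = ir
        # Over the expanded budget: compress the oldest CUs, keeping the newest expanded.
        while self._size(self.expanded) > self.expanded_limit and len(self.expanded) > 1:
            old, data = self.expanded.popitem(last=False)
            self.compressed[old] = zlib.compress(data)
        # Over the compressed budget: off-load the oldest compressed CUs to disk.
        while self.compressed and self._size(self.compressed) > self.compressed_limit:
            old, blob = self.compressed.popitem(last=False)
            path = os.path.join(self.spill_dir, old + ".cu.z")
            with open(path, "wb") as f:
                f.write(blob)
            self.on_disk[old] = path

    def get(self, name: str) -> bytes:
        """Return the IR in expanded form, without moving the CU between tiers."""
        if name in self.expanded:
            return self.expanded[name]
        if name in self.compressed:
            return zlib.decompress(self.compressed[name])
        with open(self.on_disk[name], "rb") as f:
            return zlib.decompress(f.read())
```

In this sketch `get` only returns an expanded copy; a fuller implementation would presumably promote the CU back to the expanded tier and update the CU's GCUT state accordingly.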

As each CU is received, it is initially stored in expanded form, as shown at 52. A CU symbol table (CUST) 54 is established for each received CU and includes listings of the variables, routines, types and constants contained within the respective CU. For instance, CUST 54 for CU1 includes a listing for routine A () with a pointer to the actual IR code 56 for routine A (). In similar fashion, the variables entry in CUST 54 includes variables 58 which are referred to in the IR for CU1.

A routine's IR code may refer to its CUST in several places; however, the CUST has only one reference to the routine's IR (i.e., a single handle representing the routine). This structure allows individual routines to be compressed or removed from virtual memory when they are not needed by optimizer 24 and later recalled and decompressed using the information stored in the handle. A CUST is related to entries in PST 42 in a similar manner: a CUST may refer to PST 42 in several places; however, PST 42 has only one reference to the CUST, through a single handle representing the CUST. This structure allows individual CUSTs to be compressed or removed from virtual memory when none of the routines listed therein are needed by optimizer 24. They may be re-expanded in memory by using the single handle.
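The single-handle arrangement can be sketched as an object that owns the only reference to a routine's IR, so the IR can be compressed and later restored without updating any other pointers. The `Handle` class is hypothetical, and pickle plus zlib is merely a convenient stand-in for the compiler's own serialization and compression.

```python
import pickle
import zlib
from typing import Any, Optional


class Handle:
    """The single reference through which an owner reaches a routine's IR (or a CUST)."""

    def __init__(self, payload: Any) -> None:
        self._expanded: Optional[Any] = payload
        self._compressed: Optional[bytes] = None

    def compress(self) -> None:
        # Safe only because nothing else holds a reference to the expanded payload.
        if self._expanded is not None:
            self._compressed = zlib.compress(pickle.dumps(self._expanded))
            self._expanded = None

    def get(self) -> Any:
        # Re-expand on demand using the information kept in the handle.
        if self._expanded is None:
            self._expanded = pickle.loads(zlib.decompress(self._compressed))
            self._compressed = None
        return self._expanded

    @property
    def is_expanded(self) -> bool:
        return self._expanded is not None
```

A CUST could then map each routine name to one such handle, e.g. `{"A": Handle(ir_for_a)}`, and PST 42 could hold one handle per CUST in the same way.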

As regards GCG 44, its nodes refer only to symbols stored in PST 42 and not to the intermediate representation or to any entities in the CUSTs. This allows the full GCG to be visible to optimizer 24 for in-lining or inter-procedural information propagation without the fully expanded program representation being in virtual memory. The representations of PST 42 and GCG 44 are minimal and much smaller than the sum of the intermediate representation and the CUSTs for the entire program. Only as much of the intermediate representation, and only as many of the CUSTs, as are needed for a particular optimization (e.g., in-lining) need to be expanded to their largest form at the same time.
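Because the GCG refers only to PST symbols, questions about the program's overall call structure can be answered without expanding any CU. As an illustrative sketch, assuming the GCG is reduced to a hypothetical map from routine name to callee names, the routines transitively reachable from a given routine can be found by breadth-first search; such a set might be used, for example, to bound which CUs an inter-procedural analysis will need.

```python
from collections import deque
from typing import Dict, Set


def reachable_routines(gcg: Dict[str, Set[str]], root: str) -> Set[str]:
    """Routines transitively callable from root (excluding root), using the call graph alone."""
    seen, work = {root}, deque([root])
    while work:
        routine = work.popleft()
        for callee in gcg.get(routine, set()):
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    seen.discard(root)
    return seen
```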

Turning now to FIG. 5, the procedure followed by the invention will be described. Initially, CU1 (in IR object code form) is read in from front end converter 20 via linker 22 (box 80). In response, optimizer 24 commences building GCUT 40, PST 42, GCG 44 and CUST 54. Further, the expanded form of CU1 is stored in virtual memory.

When routine A () is reached, its code listing is analyzed (box 82) and the following entries are made in PST 42: "A (): defined in 1" and "B (): referred to in 1". Further, nodes A and B are entered in GCG 44, and an edge is created between nodes A and B indicating that a call is made from node A to node B. In addition, links are established from both nodes A and B back to the corresponding entries in PST 42.

Thereafter, with CU1 now read into the transitory store (i.e., virtual memory) in expanded form, CU2 is read in (box 84). An entry is made for CU2 in GCUT 40. In addition, when routine B () is reached, an entry is made in PST 42 as follows: "B (): defined in 2". The reading-in of CU2 continues until the entirety of CU2 has been read in and analyzed.

During this time, if a memory threshold for expanded-form data is reached, the oldest read-in CUs, for example, are converted to compressed form and are maintained in virtual memory until a further threshold is reached, at which time they are off-loaded onto a disk file in their compressed form. Other removal/replacement policies can also be employed.

Once all CUs have been received, analyzed and stored (box 86), a code optimization procedure commences (box 88) wherein in-lining, cloning, etc. are performed on succeeding CUs. Certain compilation actions which do not require cross-CU access can be performed while individual CUs are being read in. Further code optimization actions requiring cross-CU access can also be accomplished while a read-in is being performed, or may wait until all CUs have been received and stored.

A significant benefit is achieved during the compilation action by in-lining code in the CUs. A decision tree is utilized to determine which calls should be replaced by in-line code and which should not. First, GCG 44 is examined to determine all edges between nodes, as each edge represents a call between routines. From this examination, it is determined how often each call is made. Then, it is determined which call sites can support an in-lining action. All information required to make this determination is represented in the GCG, so none of the CUs need to be in expanded form during this determination. For instance, if a call is made between routines in different CUs and the compile assumptions were different for those CUs, then the choice is generally made not to perform in-lining, to avoid the complexity of inserting additional data regarding the different compile assumptions. Using this type of analysis and others known to those skilled in the art, edges in GCG 44 which will not support an in-lining action are eliminated from further consideration.
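The screening step can be sketched over call-graph-level data alone. In this sketch the names are hypothetical: `home_cu` stands for the PST's defined-in information, `cu_options` is an assumed per-CU summary of the compile assumptions, and the missing-definition check is an added, assumed example of the "other" analyses the description leaves unspecified.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CallEdge:
    caller: str
    callee: str
    calls: int  # how often the call is made; carried forward to the benefit step


def inlinable_edges(edges: List[CallEdge],
                    home_cu: Dict[str, str],
                    cu_options: Dict[str, str]) -> List[CallEdge]:
    """Drop edges that cannot support in-lining, without expanding any CU."""
    kept = []
    for e in edges:
        caller_cu = home_cu.get(e.caller)
        callee_cu = home_cu.get(e.callee)
        if caller_cu is None or callee_cu is None:
            continue  # no IR available (e.g. a run-time library routine)
        if cu_options[caller_cu] != cu_options[callee_cu]:
            continue  # CUs compiled under different assumptions: not in-lined
        kept.append(e)
    return kept
```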

Next, the in-lining procedure focuses on the remaining call sites and determines the benefit to be gained by in-lining at each. As a result of the initial analysis of GCG 44, the number of calls made from each site to each other site has been accumulated. For each site, a "cost estimate" is then made for each call, i.e., how much the code at the site will grow if an in-lining action is performed at that site.

A benefit value is then determined as the ratio of the number of calls made at a call site to the cost estimate derived for that call site. When all sites have been so analyzed, a routine is run which selects those call sites whose benefit value exceeds a predetermined threshold. That threshold is preferably a limit value on the amount of code growth at a call site. The routine analyzes the benefit values with respect thereto to produce a list of call sites which are to be in-lined within the limits of the available memory allocated to the code.
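A sketch of the benefit computation and selection, under stated assumptions: the cost is the estimated code growth in IR instructions, the benefit threshold and total growth budget are caller-supplied parameters, and sites are chosen greedily in order of decreasing benefit. The greedy order and the function name are assumptions; the description does not specify how the selection routine orders candidates.

```python
from typing import List, Tuple


def select_inline_sites(sites: List[Tuple[str, int, int]],
                        min_benefit: float,
                        growth_budget: int) -> List[str]:
    """sites: (site id, number of calls, estimated code growth); returns sites to in-line."""
    scored = []
    for site, calls, growth in sites:
        benefit = calls / max(growth, 1)  # benefit = calls / cost
        if benefit > min_benefit:
            scored.append((benefit, site, growth))
    chosen, used = [], 0
    # Take the best ratios first while the total code growth stays within budget.
    for benefit, site, growth in sorted(scored, key=lambda t: (-t[0], t[1])):
        if used + growth <= growth_budget:
            chosen.append(site)
            used += growth
    return chosen
```

For example, sites with (calls, growth) of (100, 10), (5, 50) and (40, 20) have benefits 10, 0.1 and 2; with a threshold of 1.0 and a budget of 25, only the first site fits.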

Thereafter, the in-lining action is executed and the relevant code listings are inserted in place of the specific call sites. The GCG and PST are then updated to accurately reflect the new program state. For instance, wherever a called routine has been in-lined, there is no longer a call, and the corresponding edge in the GCG is removed.
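The call-graph side of that update can be sketched as follows. The GCG is represented as a hypothetical map from routine name to a set of callee names, so the sketch assumes every call from the caller to the callee has been in-lined. Besides removing the in-lined edge, it adds the callee's own calls to the caller, since the in-lined body now makes those calls; the corresponding PST update is not shown.

```python
from typing import Dict, Set


def apply_inline(graph: Dict[str, Set[str]], caller: str, callee: str) -> None:
    """Update a call graph (routine -> callees) after in-lining callee into caller."""
    # The call no longer exists, so its edge is removed.
    graph[caller].discard(callee)
    # The in-lined body brings the callee's own calls into the caller.
    graph[caller] |= graph.get(callee, set())
```

The callee's own node is kept, because other call sites may still call it.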

During the read-in action, generally only the CU being read in is maintained in expanded form. Earlier received CUs are compressed and maintained in virtual memory until a threshold memory level is reached, at which point they are off-loaded onto disk. Thereafter, during the optimization procedure, the CU being optimized is maintained in expanded form in virtual memory and, during in-lining, as each call site is reached that refers to a routine in another CU, that CU is accessed from disk (assuming it is not already in virtual memory), decompressed and utilized for the optimization action.

From the above, it can be seen that the procedure of the invention allows a compiler to process large amounts of code and data on a machine with limited memory resources. Further, the procedure is compatible with the prior art "make" utility that is widely used by software developers. The procedure includes different levels of memory-use optimization and is dynamically scalable, depending upon the user's development environment. The data representations lend themselves to efficient use of memory, because large portions of the data can be compressed, removed from virtual memory and off-loaded to disk.

It should be understood that the foregoing description is only illustrative of the invention. Various alternatives and modifications can be devised by those skilled in the art without departing from the invention. Accordingly, the present invention is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.

We claim:
 1. A compiler method for converting a source code listing comprising plural code modules, to an object code listing, said method adapted to be executed by a computer with limited memory resources, said method comprising the steps of: converting said source code listing into plural compilation units (CUs), each CU being an intermediate code listing corresponding to a code module of said source code listing; analyzing each CU and deriving: global CU table (GCUT) means for including a reference to each analyzed CU and an indication of whether or not said each analyzed CU is stored in compressed form; program symbol table (PST) means for indicating in which CU, each program routine is defined and/or referred to; global call graph (GCG) means for noting each program routine in each CU, indicating references therebetween, and further indicating where said each program routine is present in said PST; and CU symbol table (CUST) means, for each routine listed in a CU, for including a reference to a location where the routine can be found in memory; and compiling received CUs by employing at least data from said GCG means and PST means in analysis of each CU to enable cross-CU relationships to be taken into account during the analysis and utilized in arranging an object code representation of said source code listing.
 2. The method as recited in claim 1, including the further steps of: storing said GCUT means, PST means and GCG means in virtual memory in said computer; storing at least a further CUST means and associated CU listing in uncompressed form in said virtual memory and storing another CUST means and associated CU listing in compressed form; and decompressing said another CUST means and associated CU upon determining during said compiling step of a reference thereto from a CU being optimized, said determining at least based upon data accessed from said GCG means and PST means.
 3. The method as recited in claim 2, further including the steps of: storing a still further CUST means and associated CU in a disk store; and accessing from said disk store said still further CUST means and associated CU and storing a decompressed form thereof in virtual memory upon determining during said compiling step of a reference to said still further CUST means and associated CU from a CU being optimized.
 4. The method as recited in claim 1, wherein said GCUT means further includes for each CU, information indicating whether said each CU is resident in uncompressed form in virtual memory, is resident in compressed form in virtual memory, or is resident in a disk store in compressed form, said method employing said information to determine if either a decompression of said each CU or a reading of said each CU from disk to virtual memory is required.
 5. The method as recited in claim 1, wherein each CU corresponds to a source code file.
 6. The method as recited in claim 1, wherein said compiling step comprises the further step of: determining from said cross-CU relationships which call sites in one CU should be replaced by a called program routine resident in a second CU.
 7. A memory media including code for enabling a computer with limited memory resources to perform a compiler method which converts a source code listing to an object code listing, said memory media comprising: a) means for controlling said computer to convert said source code listing into plural compilation units (CUs), each CU being an intermediate code listing of a corresponding module of said source code listing; b) means for controlling said computer to analyze each CU and to derive: global CU table (GCUT) means for including a reference to each analyzed CU and an indication of whether or not said each analyzed CU is stored in compressed form; program symbol table (PST) means for indicating in which CU, each program routine is defined and/or referred to; global call graph (GCG) means for noting each program routine in each CU, indicating references therebetween, and further indicating where said each program routine is present in said PST means; and CU symbol table (CUST) means for each CU which, for each routine listed in a CU, includes a reference to a location where the routine can be found in memory; and c) means for controlling said computer to compile received CUs by employing at least data from said GCG means and PST means in analysis of each CU to enable cross-CU relationships to be taken into account during the analysis and utilized in arranging an object code representation of said source code listing.
 8. The memory media recited in claim 7, further comprising: means for controlling said computer to store said GCUT means, PST means and GCG means in said virtual memory; means for controlling said computer to store at least a further CUST means and associated CU listing in uncompressed form in said virtual memory and to store another CUST means and associated CU listing in compressed form; and means for controlling said computer to decompress said another CUST means and associated CU upon determining a presence of reference thereto from a CU being optimized, said determining at least based upon data accessed from said GCG means and PST means.
 9. The memory media recited in claim 8, further comprising: means for controlling said computer to store a still further CUST means and associated CU in a disk store; and means for controlling said computer to access from said disk store said still further CUST means and associated CU and to store a decompressed form thereof in virtual memory upon determining a presence of a reference to said still further CUST means and associated CU from a CU being optimized.
 10. The memory media as recited in claim 7, wherein said GCUT means further includes for each CU, information indicating whether said each CU is resident in uncompressed form in virtual memory, is resident in compressed form in virtual memory, or is resident in a disk store in compressed form, said means c) employs said information to determine if either a decompression of said each CU or a reading of said each CU from disk to virtual memory is required.
 11. The memory media recited in claim 7, wherein each CU corresponds to a source code file.
 12. The memory media recited in claim 7, wherein said means c) determines from said cross-CU relationships which call sites in one CU should be replaced by a called program routine resident in a second CU and performs a replacement accordingly. 